
Repurposing Annotation Guidelines to Instruct LLM Annotators: A Case Study

Kim, Kon Woo, Islamaj, Rezarta, Kim, Jin-Dong, Boudin, Florian, Aizawa, Akiko

arXiv.org Artificial Intelligence

This case study explores the potential of repurposing existing annotation guidelines to instruct a large language model (LLM) annotator in text annotation tasks. Traditional annotation projects invest significant resources, in both time and cost, in developing comprehensive annotation guidelines. These are primarily designed for human annotators, who undergo training sessions to check and correct their understanding of the guidelines. While human annotators internalize the results of this training, LLMs require the training content to be materialized. We therefore introduce a method called moderation-oriented guideline repurposing, which adapts annotation guidelines to provide clear and explicit instructions through a process called LLM moderation. Using the NCBI Disease Corpus and its detailed guidelines, our experimental results demonstrate that, despite several remaining challenges, repurposed guidelines can effectively guide LLM annotators. Our findings highlight both the promise and the limitations of the proposed workflow in automated settings, offering a new direction for scalable and cost-effective refinement of annotation guidelines and the subsequent annotation process.
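The core move described in this abstract, turning human-oriented guideline text into explicit LLM instructions, can be sketched minimally as a prompt-assembly step. The function name, rule snippets, and markup convention below are illustrative assumptions, not the paper's actual workflow:

```python
def build_annotator_prompt(guideline_rules, text):
    """Materialize human-oriented guideline rules as explicit LLM instructions.

    A toy illustration of guideline repurposing: each rule becomes a numbered
    instruction instead of being internalized through annotator training.
    """
    rules = "\n".join(f"{i}. {r}" for i, r in enumerate(guideline_rules, 1))
    return (
        "You are a biomedical annotator. Follow these rules exactly:\n"
        f"{rules}\n\n"
        "Mark every disease mention in the text below with [D]...[/D].\n"
        f"Text: {text}"
    )

prompt = build_annotator_prompt(
    ["Annotate the most specific disease mention.",
     "Include modifiers such as 'hereditary' in the span."],
    "Patients with hereditary breast cancer were enrolled.",
)
print(prompt)
```

In the paper's workflow this prompt would then go through LLM moderation, where the instructions are iteratively checked and refined; the sketch covers only the initial repurposing step.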


RDMA: Cost Effective Agent-Driven Rare Disease Discovery within Electronic Health Record Systems

Wu, John, Cross, Adam, Sun, Jimeng

arXiv.org Artificial Intelligence

Rare diseases affect 1 in 10 Americans, yet standard ICD coding systems fail to capture these conditions in electronic health records (EHR), leaving crucial information buried in clinical notes. Current approaches struggle with medical abbreviations, miss implicit disease mentions, raise privacy concerns with cloud processing, and lack clinical reasoning abilities. We present Rare Disease Mining Agents (RDMA), a framework that mirrors how medical experts identify rare disease patterns in EHR. RDMA connects scattered clinical observations that together suggest specific rare conditions. By handling clinical abbreviations, recognizing implicit disease patterns, and applying contextual reasoning locally on standard hardware, RDMA reduces privacy risks while improving F1 performance by upwards of 30% and decreasing inference costs 10-fold. This approach helps clinicians avoid the privacy risks of cloud services while accessing key rare disease information from EHR systems, supporting earlier diagnosis for rare disease patients. Available at https://github.com/jhnwu3/RDMA.


Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data

Beeri, Gal, Chamot, Benoit, Latchem, Elena, Venkatesh, Shruthi, Whalan, Sarah, Kruger, Van Zyl, Martino, David

arXiv.org Artificial Intelligence

Funding and support: The Generative AI Challenge is funded by grants from the Future Health Research and Innovation Fund (FHRIF), Grant ID IC2023-GAIA/11. Conflict of interest statement: The authors declare no conflicts of interest.

This exploratory pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated disease phenotyping from research survey data. Motivated by the need for efficient and accurate methods to harmonize the growing volume of survey data with standardized disease ontologies, we employed BERN2, a biomedical named entity recognition and normalization model, to extract disease information from the ORIGINS birth cohort survey data. After rigorously evaluating BERN2's performance against a manually curated ground-truth dataset, we integrated various LLMs using prompt engineering, Retrieval-Augmented Generation (RAG), and Instructional Fine-Tuning (IFT) to refine the model's outputs. BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with Few-Shot Inference and RAG orchestration, further improved accuracy. This approach, especially when incorporating structured examples, logical reasoning prompts, and detailed context, offers a promising avenue for developing tools that enable efficient cohort profiling and data harmonization across large, heterogeneous research datasets.


Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions

Sasse, Kuleen, Vadlakonda, Shinjitha, Kennedy, Richard E., Osborne, John D.

arXiv.org Artificial Intelligence

Background: Machine learning methods for clinical named entity recognition and entity normalization systems can utilize both labeled corpora and Knowledge Graphs (KGs) for learning. However, infrequently occurring concepts may have few mentions in training corpora and lack detailed descriptions or synonyms, even in large KGs. For Disease Entity Recognition (DER) and Disease Entity Normalization (DEN), this can result in fewer high-quality training examples relative to the number of known diseases. Large Language Model (LLM) generation of synthetic training examples could improve performance in these information extraction tasks. Methods: We fine-tuned a LLaMa-2 13B Chat LLM to generate a synthetic corpus containing normalized mentions of concepts from the Unified Medical Language System (UMLS) Disease Semantic Group. We measured overall and Out of Distribution (OOD) performance for DER and DEN, with and without synthetic data augmentation. We evaluated performance on 3 different disease corpora using 4 different data augmentation strategies, assessed using BioBERT for DER, and SapBERT and KrissBERT for DEN. Results: Our synthetic data yielded a substantial improvement for DEN: in all 3 training corpora, the top-1 accuracy of both SapBERT and KrissBERT improved by 3-9 points overall and by 20-55 points on OOD data. A small improvement (1-2 points) was also seen for DER in overall performance, but only one dataset showed OOD improvement. Conclusion: LLM generation of normalized disease mentions can improve DEN relative to normalization approaches that do not utilize LLMs to augment data with synthetic mentions. Ablation studies indicate that performance gains for DEN were only partially attributable to improvements in OOD performance. The same approach has only a limited ability to improve DER. We make our software and dataset publicly available.
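The top-1 accuracy metric reported here is the standard way DEN rankings are scored: the gold UMLS concept must appear among the top-k ranked candidates. A generic sketch of that metric (not the authors' evaluation code; the CUIs below are illustrative):

```python
def top_k_accuracy(ranked_predictions, gold_cuis, k=1):
    """Fraction of mentions whose gold CUI appears in the top-k candidates.

    ranked_predictions: one candidate-CUI list per mention, best first.
    gold_cuis: gold CUIs, aligned with ranked_predictions.
    """
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_predictions, gold_cuis)
    )
    return hits / len(gold_cuis)

preds = [["C0006142", "C0678222"],   # mention 1: gold ranked first
         ["C0011849", "C0011860"]]   # mention 2: gold ranked second
gold = ["C0006142", "C0011860"]
print(top_k_accuracy(preds, gold, k=1))  # 0.5
print(top_k_accuracy(preds, gold, k=2))  # 1.0
```

A 3-9 point gain in this metric thus means 3-9% more mentions resolved to the correct concept on the first candidate.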


Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification

Wu, Jinge, Dong, Hang, Li, Zexi, Patra, Arijit, Wu, Honghan

arXiv.org Artificial Intelligence

The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification of rare diseases from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that synergistically combines a traditional dictionary-based natural language processing (NLP) tool with the powerful capabilities of large language models (LLMs) to enhance the identification of rare diseases from unstructured clinical notes. We comprehensively evaluate various prompting strategies on six LLMs of varying sizes and domains (general and medical). This evaluation encompasses zero-shot, few-shot, and retrieval-augmented generation (RAG) techniques to enhance the LLMs' ability to reason about and understand contextual information in patient reports. The results demonstrate the effectiveness of the hybrid approach in rare disease identification, highlighting its potential for identifying underdiagnosed patients from clinical notes.
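The "retrieve and refine" combination described above can be sketched as a prompt-assembly step: dictionary-tool candidates and retrieved few-shot examples are merged into a single LLM query. The function, prompt wording, and example notes are hypothetical, not taken from the paper:

```python
def build_rag_prompt(note, dictionary_hits, retrieved_examples):
    """Combine dictionary-NLP candidates with retrieved few-shot examples
    into one prompt for rare-disease identification.

    A schematic sketch of the hybrid retrieve-and-refine idea.
    """
    examples = "\n".join(
        f"Note: {ex['note']}\nDiseases: {ex['answer']}"
        for ex in retrieved_examples
    )
    candidates = ", ".join(dictionary_hits) or "none"
    return (
        f"Examples:\n{examples}\n\n"
        f"Candidate mentions from a dictionary tool: {candidates}\n"
        "Confirm which candidates are genuine rare-disease mentions in the "
        f"note below, and add any the tool missed.\nNote: {note}"
    )

prompt = build_rag_prompt(
    "Findings consistent with Fabry disease.",
    ["Fabry disease"],
    [{"note": "History of MSUD.", "answer": "maple syrup urine disease"}],
)
print(prompt)
```

The design point is that the LLM refines (confirms, rejects, or extends) the dictionary tool's candidates rather than extracting from scratch.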


Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources

Hansen, Lasse Hyldig, Andersen, Nikolaj, Gallifant, Jack, McCoy, Liam G., Stone, James K, Izath, Nura, Aguirre-Jerez, Marcela, Bitterman, Danielle S, Gichoya, Judy, Celi, Leo Anthony

arXiv.org Artificial Intelligence

Background Advancements in Large Language Models (LLMs) hold transformative potential in healthcare; however, recent work has raised concern about the tendency of these models to produce outputs that display racial or gender biases. Although training data is a likely source of such biases, exploration of disease and demographic associations in text data at scale has been limited. Methods We conducted a large-scale textual analysis using a dataset comprising diverse web sources, including Arxiv, Wikipedia, and Common Crawl. The study analyzed the context in which various diseases are discussed alongside markers of race and gender. Given that LLMs are pre-trained on similar datasets, this approach allowed us to examine the potential biases that LLMs may learn and internalize. We compared these findings with actual demographic disease prevalence as well as GPT-4 outputs in order to evaluate the extent of bias representation. Results Our findings indicate that demographic terms are disproportionately associated with specific disease concepts in online texts. Gender terms are prominently associated with disease concepts, whereas racial terms are much less frequently associated. We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed. Most prominently, we see an overall significant overrepresentation of Black race mentions in comparison to population proportions. Conclusions Our results highlight the need for critical examination and transparent reporting of biases in LLM pretraining datasets. Our study suggests the need to develop mitigation strategies to counteract the influence of biased training data in LLMs, particularly in sensitive domains such as healthcare.


Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System

Villena, Fabián, Rojas, Matías, Arias, Felipe, Pacheco, Jorge, Vera, Paulina, Dunstan, Jocelyn

arXiv.org Artificial Intelligence

The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant because it enables information extraction from unstructured data, supporting, for example, epidemiological studies of the incidence and prevalence of diseases in a given context. However, the manual coding process is error-prone, as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes considerable time and effort that could be allocated to more clinically relevant tasks. These difficulties can be addressed by developing computational systems that automatically assign codes to diseases. Accordingly, we propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system. Specifically, our model uses a state-of-the-art NER model for recognizing disease mentions and a search engine system based on Elasticsearch for assigning the most relevant codes associated with these disease mentions. The system's performance was evaluated on referrals manually coded by clinical experts. Our system obtained a MAP score of 0.63 at the subcategory level and 0.83 at the category level, close to the best-performing models in the literature. This system could be a support tool for health professionals, optimizing the coding and management process. Finally, to guarantee reproducibility, we publicly release the code of our models and experiments.
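The second step of such a pipeline, retrieving the most relevant codes for a recognized mention, can be sketched with a simple lexical ranker. This is a stand-in using `difflib` in place of the Elasticsearch engine the paper actually deploys, and the mini ICD-10 vocabulary below is invented for illustration:

```python
import difflib

# Toy ICD-10 subcategory vocabulary (illustrative entries only).
VOCAB = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential (primary) hypertension",
    "J45.9": "asthma, unspecified",
}

def rank_codes(mention, vocab=VOCAB, top_n=3):
    """Rank vocabulary codes by lexical similarity to a disease mention.

    Stands in for the search-engine retrieval step: score every code's
    description against the mention, best first.
    """
    scored = [
        (difflib.SequenceMatcher(None, mention.lower(), desc).ratio(), code)
        for code, desc in vocab.items()
    ]
    return [code for _, code in sorted(scored, reverse=True)[:top_n]]

print(rank_codes("type 2 diabetes"))
```

A production system would replace this with full-text search over the complete vocabulary (as with Elasticsearch here), but the ranked-candidate output shape, which MAP then evaluates, is the same.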


Overview of BioASQ 2022: The tenth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Nentidis, Anastasios, Katsimpras, Georgios, Vandorou, Eirini, Krithara, Anastasia, Miranda-Escalada, Antonio, Gasco, Luis, Krallinger, Martin, Paliouras, Georgios

arXiv.org Artificial Intelligence

This paper presents an overview of the tenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2022. BioASQ is an ongoing series of challenges that promotes advances in the domain of large-scale biomedical semantic indexing and question answering. In this edition, the challenge was composed of the three established tasks a, b, and Synergy, and a new task named DisTEMIST for automatic semantic annotation and grounding of diseases from clinical content in Spanish, a key concept for semantic indexing and search engines of literature and clinical records. This year, BioASQ received more than 170 distinct systems from 38 teams in total for the four different tasks of the challenge. As in previous years, the majority of the competing systems outperformed the strong baselines, indicating the continuous advancement of the state-of-the-art in this domain.


CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets

Verma, Harsh, Bagherzadeh, Parsa, Bergler, Sabine

arXiv.org Artificial Intelligence

This paper summarizes the CLaC submission for SMM4H 2022 Task 10, which concerns the recognition of diseases mentioned in Spanish tweets. The simplicity of this pipeline and its knowledge injection from readily available domain resources, rather than training purely from training data, make our system's strength.


Medical Entity Linking using Triplet Network

Mondal, Ishani, Purkayastha, Sukannya, Sarkar, Sudeshna, Goyal, Pawan, Pillai, Jitesh, Bhattacharyya, Amitava, Gattu, Mahanandeeshwar

arXiv.org Artificial Intelligence

Entity linking (or normalization) is an essential task in text mining that maps entity mentions in medical text to standard entities in a given Knowledge Base (KB). This task is of great importance in the medical domain, and it can also be used for merging different medical and clinical ontologies. In this paper, we focus on the problem of disease linking or normalization. This task is executed in two phases: candidate generation and candidate scoring. We present an approach to rank candidate Knowledge Base entries based on their similarity to the disease mention, using a Triplet Network for candidate ranking. While existing methods have used carefully generated sieves and external resources for candidate generation, we introduce a robust and portable candidate generation scheme that does not rely on hand-crafted rules. Experimental results on the standard benchmark NCBI disease dataset demonstrate that our system outperforms prior methods by a significant margin.
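The candidate-scoring idea can be illustrated with the triplet objective itself: pull the mention embedding toward the correct KB entry and push it away from wrong candidates. This is a generic stdlib sketch of the standard triplet margin loss with toy 2-D embeddings, not the paper's network:

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: the mention embedding (anchor) should be
    closer to the correct KB entry (positive) than to a wrong candidate
    (negative) by at least `margin`; otherwise a positive loss is incurred."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

mention = [1.0, 0.0]    # embedding of the disease mention
correct = [0.9, 0.1]    # correct KB concept, nearby
wrong = [-1.0, 0.0]     # unrelated concept, far away
print(triplet_loss(mention, correct, wrong))  # 0.0 -> triplet already satisfied
```

At inference time the trained network is used differently: candidates are simply ranked by their embedding distance to the mention, smallest first.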